Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add field type to query shape and extensive unit tests #140

Merged

Conversation

deshsidd
Copy link
Collaborator

@deshsidd deshsidd commented Oct 10, 2024

Description

Add field type to query shape and integrate with the setting changes for field_name and field_type: #135

Example

Field Mapping:

{
    "index1": {
        "mappings": {
            "properties": {
                "field1": { "type": "keyword" },
                "field2": { "type": "text" },
                "field3": { "type": "text" },
                "field4": { "type": "long" }
            }
        }
    }
}

Query:

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "field1": "example_value"
                    }
                }
            ],
            "filter": [
                {
                    "match": {
                        "field2": "search_text"
                    }
                },
                {
                    "range": {
                        "field4": {
                            "gte": 1,
                            "lte": 100
                        }
                    }
                }
            ],
            "should": [
                {
                    "regexp": {
                        "field3": ".*"
                    }
                }
            ]
        }
    }
}

Query Shape:

bool []
  must:
    term [field1, keyword]
  filter:
    match [field2, text]
    range [field4, long]
  should:
    regexp [field3, text]

Local Sanity Testing Logs

  1. Tested all combinations of field_type and field_name enabled/disabled
  2. Tested multiple types of search queries including complex queries with aggregation, sort, pipeline agg
  3. Tested invalid fields not contained in the mapping
[2024-10-09T18:33:16,662][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: match [name, text]

[2024-10-09T18:33:18,673][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: term [email, keyword]

[2024-10-09T18:33:20,721][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: range [age, integer]

[2024-10-09T18:33:23,045][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: 
bool []
  must:
    match [name, text]
    range [age, integer]

[2024-10-09T18:33:25,295][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: 
wildcard [email, keyword]

[2024-10-09T18:33:27,630][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: 
aggregation:
  avg [age, integer]

[2024-10-09T18:33:30,132][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: 
nested []
  must:
    match [address.city]

[2024-10-09T18:35:43,292][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: 
bool []
  must:
    match [name, text]
  filter:
    range [age, integer]
aggregation:
  terms [age, integer]
    aggregation:
      max [join_date, date]
      pipeline aggregation:
        bucket_sort
sort:
  desc [join_date, date]

Updating [search.insights.top_queries.grouping.attributes.field_type] from [true] to [false]

[2024-10-09T18:39:14,158][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: 
bool []
  must:
    match [name]
  filter:
    range [age]
aggregation:
  terms [age]
    aggregation:
      max [join_date]
      pipeline aggregation:
        bucket_sort
sort:
  desc [join_date]

[2024-10-09T18:39:39,482][INFO ][o.o.c.s.ClusterSettings  ] [integTest-0] updating [search.insights.top_queries.grouping.attributes.field_name] from [true] to [false]
[2024-10-09T18:39:39,482][INFO ][o.o.c.s.ClusterSettings  ] [integTest-0] updating [search.insights.top_queries.grouping.attributes.field_type] from [false] to [true]
[2024-10-09T18:39:42,616][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: bool []
  must:
    match [text]
  filter:
    range [integer]
aggregation:
  terms [integer]
    aggregation:
      max [date]
      pipeline aggregation:
        bucket_sort
sort:
  desc [date]

[2024-10-09T18:39:48,314][INFO ][o.o.c.s.ClusterSettings  ] [integTest-0] updating [search.insights.top_queries.grouping.attributes.field_type] from [true] to [false]
[2024-10-09T18:39:50,451][INFO ][o.o.p.i.c.l.QueryInsightsListener] [integTest-0] Query Shape: bool
  must:
    match
  filter:
    range
aggregation:
  terms
    aggregation:
      max
      pipeline aggregation:
        bucket_sort
sort:
  desc

resolves: #69

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
@deshsidd deshsidd force-pushed the sid/query-shape-field-type branch from 52dab34 to 5239f68 Compare October 10, 2024 00:23
@deshsidd deshsidd changed the title Add query type to query shape and extensive unit tests Add field type to query shape and extensive unit tests Oct 10, 2024
@deshsidd deshsidd force-pushed the sid/query-shape-field-type branch 2 times, most recently from dda1aa4 to f05c5d2 Compare October 10, 2024 01:50
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
@deshsidd deshsidd force-pushed the sid/query-shape-field-type branch from f05c5d2 to eb2877d Compare October 10, 2024 01:51
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
Copy link
Collaborator

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better now. Minor comments to improve the readability

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
@jainankitk jainankitk merged commit dcf52c2 into opensearch-project:main Oct 10, 2024
18 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 10, 2024
* Add query type to shape and extensive unit tests

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

* Log query shape if trace log enabled

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

* Refactor code, handle multifield with test, remove properties cache

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

* Use only first index and optimization to get properties once per query

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

* Further refactoring

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

---------

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
(cherry picked from commit dcf52c2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot
Copy link

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/query-insights/backport-2.17 2.17
# Navigate to the new working tree
pushd ../.worktrees/query-insights/backport-2.17
# Create a new branch
git switch --create backport/backport-140-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 dcf52c28a6bd7ed169c71fd8c006e56db7b73f13
# Push it to GitHub
git push --set-upstream origin backport/backport-140-to-2.17
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/query-insights/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-140-to-2.17.

deshsidd pushed a commit that referenced this pull request Oct 10, 2024
* Add query type to shape and extensive unit tests



* Log query shape if trace log enabled



* Refactor code, handle multifield with test, remove properties cache



* Use only first index and optimization to get properties once per query



* Further refactoring



---------


(cherry picked from commit dcf52c2)

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@ansjcy
Copy link
Member

ansjcy commented Oct 10, 2024

@deshsidd don't backport this to 2.17, this is for open source 2.18.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] [RFC] Query Shape Field Data Type
3 participants